Record: Two-Level Dirichlet Posterior + Phrase Cache — 0.11556 BPB (3-seed) #948

Open
dentity007 wants to merge 2 commits into openai:main from NathanMaine:submission/nathanmaine-dirichlet-ngram

@dentity007

Two-Level Dirichlet Posterior + Per-Order OBCL + Phrase Cache

val_bpb: 0.11556 (3-seed mean, std 0.0000057) | ~15.1 MB | 8xH100 SXM

3-seed validation

| Seed | Val BPB | Eval time | Artifact bytes |
|------|---------|-----------|----------------|
| 1337 | 0.11555061 | 419s | 15,077,877 |
| 42 | 0.11556435 | 370s | 15,077,877 |
| 2025 | 0.11555875 | 359s | 15,077,877 |
| Mean | 0.11556 (std 0.0000057) | | |
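
The per-seed figures can be re-aggregated with a few lines of Python. This is only a convenience sketch: the PR does not say which standard-deviation estimator it reports, so the use of the population form (`pstdev`) here is an assumption, and the variable names are illustrative.

```python
from statistics import mean, pstdev

# Per-seed validation BPB, copied from the 3-seed table.
val_bpb = {1337: 0.11555061, 42: 0.11556435, 2025: 0.11555875}

values = list(val_bpb.values())
mean_bpb = mean(values)
# Assumption: population standard deviation; the PR does not name the estimator.
spread = pstdev(values)

print(f"mean val_bpb = {mean_bpb:.5f}")
print(f"std  val_bpb = {spread:.2e}")
```

Swapping `pstdev` for `stdev` gives the sample estimator instead; with only three seeds the two differ noticeably, so it is worth stating which one is meant.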

Techniques

  • Two-level Dirichlet-Multinomial posterior mixing (neural → n-gram → phrase)
  • Per-order OBCL concentrations: [50.0, 50.0, 6.95, 2.98, 2.05, 2.05, 2.05, 1.86, 1.86, 1.86, 1.86, 1.86, 1.86, 1.86]
  • Phrase suffix matching at probe lengths [20, 16] with Dirichlet concentration 1.0
  • 15-gram backoff (orders 2-15, 4M hash buckets)
  • Complementary training (alpha=0.50, orders 2-5)
  • EBLS architecture (3 shared x 3 loops + 2 unique = 11L)
  • GPTQ int6 + LZMA compression
  • EMA 0.997 + SWA weight averaging
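
As a rough illustration of the first three bullets, a two-level Dirichlet-Multinomial mix can be sketched as follows. The sketch assumes the standard posterior-predictive form `(count + c * prior) / (total + c)`, applied once with the neural probability as the prior for the n-gram counts and again with that result as the prior for the phrase counts. The function names, the argument layout, and the mapping of the first listed concentration to order 2 are assumptions for illustration; the PR's actual code may differ.

```python
# Per-order concentrations as listed in the PR (14 values, orders 2-15).
OBCL_CONCENTRATIONS = [50.0, 50.0, 6.95, 2.98, 2.05, 2.05, 2.05,
                       1.86, 1.86, 1.86, 1.86, 1.86, 1.86, 1.86]
MIN_ORDER = 2  # assumption: index 0 of the list corresponds to order 2
PHRASE_CONCENTRATION = 1.0  # from the PR's phrase-cache bullet


def concentration_for_order(order: int) -> float:
    """Look up the concentration for an n-gram order (hypothetical helper)."""
    return OBCL_CONCENTRATIONS[order - MIN_ORDER]


def dirichlet_posterior(prior: float, count: float, total: float,
                        concentration: float) -> float:
    """Posterior-predictive probability of one token under a Dirichlet prior.

    prior: prior probability of the token
    count: times the token followed this context so far
    total: times the context was seen so far
    """
    return (count + concentration * prior) / (total + concentration)


def two_level_mix(p_neural: float, order: int,
                  ngram_count: float, ngram_total: float,
                  phrase_count: float, phrase_total: float) -> float:
    """Neural prior -> n-gram posterior -> phrase posterior."""
    p_ngram = dirichlet_posterior(p_neural, ngram_count, ngram_total,
                                  concentration_for_order(order))
    return dirichlet_posterior(p_ngram, phrase_count, phrase_total,
                               PHRASE_CONCENTRATION)


# With no cache evidence the mix falls back to the neural probability.
print(two_level_mix(0.1, order=9, ngram_count=0, ngram_total=0,
                    phrase_count=0, phrase_total=0))
```

Under this form a large concentration (50.0 for the lowest orders) keeps the result close to the neural prior, while a small one (1.86 for the highest orders) lets a handful of matching counts dominate.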

Compliance

  • Training: 560s on 8xH100 (within 600s)
  • Eval: 419s worst case (within 600s)
  • Artifact: 15,077,877 bytes (within 16,000,000)
  • All caches strictly backward-looking (causal)
  • Score-first evaluation
  • No training data accessed during evaluation
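
The "backward-looking" and "score-first" claims describe an ordering constraint: each token is scored using only statistics gathered from earlier positions, and the cache is updated afterwards. A toy sketch of that ordering, using a unigram cache with a uniform prior purely for illustration (the function name, the uniform prior, and the unigram cache are assumptions, not the PR's implementation):

```python
import math
from collections import Counter


def score_first_bits(tokens, vocab_size, concentration=1.0):
    """Total code length in bits for `tokens` under a score-then-update cache."""
    counts = Counter()
    seen = 0
    total_bits = 0.0
    prior = 1.0 / vocab_size  # assumption: uniform prior stands in for the model
    for tok in tokens:
        # 1) Score first: only counts from earlier positions are visible here.
        p = (counts[tok] + concentration * prior) / (seen + concentration)
        total_bits += -math.log2(p)
        # 2) Update afterwards, so the token never influences its own score.
        counts[tok] += 1
        seen += 1
    return total_bits


print(score_first_bits([0, 0, 1, 0], vocab_size=2))
```

Because the update happens after scoring, the cost of any prefix is unaffected by what follows it, which is the property the causality claim relies on.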

Credits

Built on the community's work:

@dentity007 changed the title from "Two-Level Dirichlet Posterior + Phrase Cache — 0.11556 BPB (3-seed)" to "Record: Two-Level Dirichlet Posterior + Phrase Cache — 0.11556 BPB (3-seed)" on Mar 27, 2026